A Proposal for Evaluating Answer Distillation from Web Data
Authors
Abstract
Information retrieval systems can attempt to answer the user's query directly by extracting an appropriate passage of text from a corpus and presenting it on the results page. However, the passage sometimes contains extraneous information, or multiple passages are needed to form an answer. In such cases an answer distillation system could be useful: it takes the query and the answer-containing passage as input and produces a succinct answer for presentation to the user. We formulate answer distillation as a sub-problem of machine comprehension and natural language generation, drawing on techniques from neural machine learning, information retrieval, and natural language processing. Progress on answer distillation would benefit from a dataset consisting of many query-passage pairs with their corresponding "ground-truth" (distilled) answers. We also need a metric to measure the quality of the distilled answers. In this paper we share our early ideas on building such a dataset and solicit feedback from the community. Our goal is to align our needs for an answer distillation dataset with the needs of future academic research in this space. In particular, we propose that having a large number of reference answers available per query would be beneficial, and consequently suggest extensions to metrics such as BLEU and METEOR for the scenario where this is true.
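To make the multi-reference setting concrete, the following is a minimal sketch of the standard clipped n-gram precision that BLEU is built on, computed against several reference answers at once. It is not the extension the paper proposes, which the abstract does not specify. The function names `ngrams` and `clipped_precision`, the whitespace tokenization, and the example strings are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n=1):
    """Clipped n-gram precision of a candidate against multiple references.

    Each candidate n-gram count is clipped to the largest count of that
    n-gram in any single reference. Whitespace tokenization is an
    assumption of this sketch.
    """
    cand_counts = ngrams(candidate.split(), n)
    if not cand_counts:
        return 0.0

    # For every n-gram, keep the maximum count seen in any one reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split(), n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


if __name__ == "__main__":
    refs = ["the cat", "a cat sat down"]
    print(clipped_precision("the cat sat", refs[:1]))  # one reference
    print(clipped_precision("the cat sat", refs))      # two references
```

In this sketch, adding a second reference can only keep the precision the same or raise it, since each added reference can only increase the per-n-gram clipping limit. This is one simple way to see why the number of available reference answers matters for metrics of this family.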
Similar resources
Thesis Proposal Csc 2015 Simulation and Optimisation of a System Coupling Solar Collectors and Vacuum Membrane Distillation for Desalination
Scientific domain: %scientific_domain Subject short description: Water shortage and decline of the quality of fresh water resources make sea water desalination become a major way for drinking water production. Reverse Osmosis (RO) is now the main technology used for seawater desalination, but it shows a limited recovery and the environmental impact of large volumes of rejected brines is still a...
Subsite Retrieval: A Novel Concept for Topic Distillation
Topic distillation is one of the main information needs when users search the Web. In previous approaches to topic distillation, the single page was treated as the basic searching unit. This strategy is inherited from general information retrieval, which has not fully utilized the structure information of the Web. In this paper, we propose a novel concept for topic distillation, named subsite r...
Relevance Propagation for Topic Distillation UIUC TREC 2003 Web Track Experiments
In this paper, we report our experiments on the Web Track TREC-2003. We submitted five runs for the topic distillation task. Our goal was to evaluate the standard language modeling algorithms for topic distillation, as well as to explore the impact of combining link and content information. We proposed a new general relevance propagation model for combining link and content information, and exp...
Overview of the TREC 2003 Web Track
The TREC 2003 web track consisted of both a non-interactive stream and an interactive stream. Both streams worked with the .GOV test collection. The non-interactive stream continued an investigation into the importance of homepages in Web ranking, via both a Topic Distillation task and a Navigational task. In the topic distillation task, systems were expected to return a list of the homepages o...
Large Scale Distributed Neural Network Training through Online Distillation
Techniques such as ensembling and distillation promise model quality improvements when paired with almost any base model. However, due to increased test-time cost (for ensembles) and increased complexity of the training pipeline (for distillation), these techniques are challenging to use in industrial settings. In this paper we explore a variant of distillation which is relatively straightforwar...